Feature Engineering Bookcamp by Sinan Ozdemir

Author: Sinan Ozdemir
Language: eng
Format: mobi
Publisher: Manning Publications Co.
Published: 2022-08-24T22:00:00+00:00


❶ Using a custom tokenizer

❷ Not needed anymore, as our tokenizer is removing stop words and is lowercasing
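To make the two callouts above concrete, here is a minimal sketch of a tokenizer that lowercases, removes stop words, and stems in one pass. It is an illustration under stated assumptions, not the book's listing: the names `STOP_WORDS`, `SUFFIXES`, `toy_stem`, and `stemming_tokenizer` are hypothetical, the stop-word list is deliberately tiny, and the suffix stripper is a crude stand-in for a real stemmer such as Porter or Snowball.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "my", "i"}

# Suffixes stripped by the toy stemmer, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter or Snowball."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemming_tokenizer(text):
    """Lowercase, split into word tokens, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(stemming_tokenizer("The flights were delayed and I was waiting"))
# prints ['flight', 'were', 'delay', 'wait']
```

A callable like this can be passed to scikit-learn's `CountVectorizer` or `TfidfVectorizer` through the `tokenizer` parameter. Because the tokenizer already lowercases and filters stop words, the vectorizer's own settings for those steps become redundant. Note how "delayed", "delays", and "delaying" would all collapse to the single token "delay": that merging is exactly the information a stemmer discards.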

Our results (figure 5.19) show a reduction in performance, as we saw with our text cleaning.

Figure 5.19 Our stemmer is not showing a boost in performance, which implies that the tokens we were trying to remove had enough signal in them to lower our pipeline’s performance.

It looks like neither of our feature improvement techniques showed a boost in performance, but this is OK! They were both worth trying, and the result reveals a deeper truth about our data.

It’s tempting to get frustrated when basic feature engineering techniques don’t work on text data, but context seems to really matter here, and this is often true in NLP. In the next few chapters, we will start to move away from interpretable features that represent individual tokens in our text and toward latent features, that is, features that represent a hidden structure in the data that is more complex than bag-of-words.


